Time-frequency analysis of extreme-mass-ratio 



o 

^ : inspiral signals in mock LISA data 



Q i Jonathan R Gair^, Ilya Mandel^H and Linqing Wen^ 

^ Institute of Astronomy, Madingiey Road, CBS OHA, Cambridge, UK 
^ Theoretical Astrophysics, Cahfornia Institute of Technology, Pasadena, CA 91125 
* Max-Planck-Institut fiir Gravitationsphysik, Am Miihlenberg 1, 14476, Potsdam, Germany 



(N 



^ ' E-mail: jgair@ast.cam.ac.uk, i-mandel@northwestern.edu, lwen@aei.mpg.de 

cr: 

' Abstract. Extreme-mass-ratio inspirals (EMRIs) of compact objects with mass m ~ 1 — 

10 Mq into massive black holes with mass can serve as excellent probes of strong- 

field general relativity. The Laser Interferometer Space Antenna (LISA) is expected to detect 
gravitational wave signals from ~ 100 EMRIs per year, but the data analysis of EMRI signals 
^ ■ poses a unique set of challenges due to their long duration and the extensive parameter space 

' of possible signals. One possible approach is to carry out a search for EMRI tracks in the time- 

ly , frequency domain. We have applied a time-frequency search to the data from the Mock LISA 

CN ■ Data Challenge (MLDC) with promising results. Our analysis used the Hierarchical Algorithm 

'/^ I for Clusters and Ridges to identify tracks in the time-frequency spectrogram corresponding to 

EMRI sources. We then estimated the EMRI source parameters from these tracks. In these 
proceedings, we discuss the results of this analysis of the MLDC round 1.3 data. 

o 

> 

^ ' 1. Introduction 

^ . The cores of most galaxies are expected to contain massive black holes (MBHs) . Compact stellar- 

mass objects (white dwarfs, neutron stars, and stellar-mass black holes) may be captured by an 
MBH, and may gradually spiral into the MBH under the influence of radiation reaction from the 
emission of gravitational waves (see [1] and references therein for details). Such extreme-mass- 
ratio inspirals (EMRIs) constitute one of the main sources for the planned gravitational-wave 
detector LISA (Laser Interferometer Space Antenna) [2]. Rate predictions are uncertain, but 
tens to thousands of EMRIs may be observed during the lifetime of the LISA mission [1] . 

Detection of EMRIs will not be an easy task, however. The expected gravitational wave 
(GW) signals from EMRIs will be buried in instrumental noise and a foreground of galactic 
white-dwarf binaries. Using the notation of [3j, EMRIs can be parametrized by: the two masses 
M and m, the dimensionless spin parameter of the massive black hole S, the azimuthal frequency 
I'o and orbital eccentricity cq specified at some moment of time, the orbital inclination A, the 
source sky position angles (ps and 6s, the orientation of the spin axis of the MBH (pK and 6k, 
the distance to the source D, and three phase angles ^q, 70 and ao, which specify, respectively, 
the initial azimuthal orbital phase, periapsis precession phase, and phase of the orbital-plane 
precession. The large size of the EMRI parameter space, together with the long observation 
timescale (~ year), makes standard matched filtering impractical as a detection method. 
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Time-frequency searches are one possible alternative to coherent matched filtering. These 
have a much lower computational cost, albeit at the expense of lower detection sensitivity. Time- 
frequency methods consist of building a spectrogram of the signal by dividing it into shorter 
segments and performing a Fourier transform on these segments, then identifying possible tracks 
in the spectrogram, and finally estimating source parameters from these tracks. The pipeline 
we employ utilizes the Hierarchical Algorithm for Clusters and Ridges [4] to identify tracks 
by clustering bright pixels in binned spectrograms, followed by a simple parameter estimation 
routine that uses the track shape to infer six of fourteen EMRI parameters and the power 
variation over time to estimate the two sky position angles. 

We have tested the performance of this time-frequency search on the Round 1.3 Mock LISA 
Data Challenge (MLDC) [5j. Challenges 1.3.1, 1.3.2, 1.3.3, 1.3.4, and 1.3.5 each included a 
single EMRI signal embedded in noise, with signal-to-noise ratio (SNR) between 40 and 110 [6]. 
We attempted to analyze all five signals, and were successful in detecting EMRIs and estimating 
their parameters in the first four of these challenges. We discuss our track search methods in 
section [2] and our parameter estimation in section [3l the results are summarized in section U 
and possible future improvements are discussed in section [5l 

2. Track search methods 

The basis of our analysis of the data sets was the Hierarchical Algorithm for Clusters and Ridges 
(HACR). A detailed description of the method may be found in [4j. HACR searches for clusters 
in the time-frequency spectrogram by identifying particularly bright "black pixels" (with power 
above the threshold pup) and then counting the number of relatively bright "brown pixels" (with 
power above the threshold piow) forming a contiguous cluster connected to each black pixel. Any 
cluster containing more than A'^pix pixels is a candidate event. The HACR search also employs 
binning — we construct a sequence of binned spectrograms by using overlapping square bins 
of the form 2"* x 2"/ with nt,nf = 0,1,2,.... This technique was first employed in the Excess 
Power search described in [71[8l[9]. We used the Excess Power search in parallel with HACR as 
a check of the results. 

For the analysis of the MLDC Round 1.3 data, we tuned the thresholds of the HACR search 
for each challenge set using the procedure described in [3]. This process involves injection of 
a test waveform into a sequence of noise realizations. For the test waveform, we used the 
corresponding MLDC training source in each case. The HACR thresholds were then determined 
by specifying an overall rate of false alarms per LISA mission (FAP) of 10%. The training and 
challenge data sets (the LISA Simulator sets) were then analyzed as follows: 

• The release data was used to construct A, E and T data streams. 

• The A, E and T data sets were bandpassed to a frequency range appropriate to that 
particular source, and then whitened using the theoretical noise spectral densities. 

• The data was divided up into segments of 2^^ data points in length (approximately 
11.4 days). Each segment was FFT'd and thus a spectrogram of each data stream was 
constructed, with 64 points in time and 32768 points in frequency. For the 1.3.4 and 1.3.5 
releases, segment lengths of 2^^ and 2^^ data points were also used. 

• The A and E spectrograms were summed pointwise and searched using the appropriate 
tuned HACR thresholds for an overall search false alarm probability of 10%. 

• The identified clusters were cleaned to aid parameter extraction. This cleaning used 
percolation (setting a high threshold and lowering it gradually to see features appear) to 
identify the brightest parts of the clusters, presumed to be from source tracks. This was 
followed by applying piecewise linear filters, aligned along these tracks, to remove spurious 
noise pixels from the cluster. 



To test the HACR thresholds, we also constructed "noise only" data sets by subtracting the 
training signal from the MLDC training release and searching that data stream. In each case, 
the results were consistent with the imposed FAP. For challenges 1.3.1-1.3.4 we were able to 
identify several tracks in the challenge data set using HACR. In the challenge 1.3.5 data set, 
HACR identified a few pixels as interesting, but these could not be used for parameter extraction. 
This was consistent with the low stated prior on the source SNR G [40,60]. We estimated the 
detection probability of the corresponding training data set at only 85%. 



3. Parameter estimation methods 

The detection part of the search algorithm is fairly well developed. Parameter estimation from 
detected clusters and tracks is not so well advanced and is likely to improve significantly in the 
future. However, we describe the simple techniques that we have so far employed in this section. 

In the time- frequency domain, a typical EMRI signal is characterized by a sequence of tracks. 
These tracks correspond to harmonics of the azimuthal frequency, at frequencies nv -\- ^/tt {v 
is the Keplerian frequency and 7 is the correction due to periapsis precession), plus sidebands 
that arise from orbital-plane precession which has characteristic frequency d/(27r). The tracks 
appear in groups corresponding to a given value of n, since typically d. In general, we 
expect that the quadrupole (n = 2) harmonics will be brightest for low-eccentricity inspirals. 
The separation between the groups of harmonics at a given time gives the value of v at that 
time, while the separation between the members of a group gives an estimate of d/(27r). The 
value of the frequency of a track, /t, then gives 7, provided the harmonic is correctly identified 
(i.e., 7/7r = ft — nv — ka/{2iT), but n and k need to be known or guessed). The evolution of 
these frequencies depends on the source parameters. The track shape thus provides a constraint 
on the parameters. This idea was also discussed in |9]. The power fluctuation along a track 
arises from the motion of the LISA detector and thus constrains sky position. 



3.1. Sky position estimate 

The starting point for this estimate is the low frequency approximation to the LISA response 
given in [TO]. Making the simplifying assumption that the amplitudes of the two waveform 
polarizations are equal (A^ = ylx in the notation of [lOj), the detector motion induced 
modulation, A4{t), of the signal power, h'j + h'jj can be approximated by 

M it) oc (F+)2 + (F/ f + {F+f + (F/^)2 ocl + 6 cos^ 6 + cos^ 9 (1) 

where 



cos a = - cos as sm 9s cos — — 

2 * 2 ^ \ T 



in which our notation is consistent with [T^, and T = 1 year. 

To estimate the sky position, we took a 100 x 100 grid of values of {9s, (ps) £ ([0, vr/2], [0, 27r]). 
For each pair of values, we found via an automatic maximization the overall amplitude A that 
gave the best fit AM.{t) to the power profile along the track and computed the of the resulting 
fit. The best-fit sky position was the pair of parameters that minimized this x^- Since Eq. ([1]) is 
based on a low frequency approximation, there is a degeneracy between the point {9s, (ps) ™d 
the point (tt — 9s, 4>s + '^), and so we always chose the value of 9s that lay in the range [0, 7r/2]. 
We note also that the returned sky positions quoted in Table [T] are 7r/2 — 9s since 9s as used by 
Cutler [To] is the ecliptic colatitude not ecliptic latitude. 

For the Round 1.3 data sets, we used only the longest detected track for the sky position 
estimation. This could obviously be improved by using multiple tracks. For a single track, the 
sky position determination code runs in a fraction of a second. As an illustration of this method, 
we show in Figured] the power profile along a track, and the best fit sky modulation curve for 




Figure 1. The power profile along the longest detected track and the best-fit sky modulation 
curve for training data set 1.3.2. 



the longest track detected in an analysis of the 1.3.2 training data. The best fit parameters 
were 6s = 1-78 and (ps = 3.64. The actual sky position for that training set was 9s = 1.83, 
(ps = 3.62. In Table [T] we also see that we recovered sky position well in the challenge sets, 
despite the approximations that went into Eq. ([T]). 

3.2. Other intrinsic parameters 

As mentioned above, the tracks give measurements of the three fundamental frequencies, v, 7 
and a. These depend on the six intrinsic parameters that are not phase angles — M, S, m, 
A, Co and vq. Explicit expressions for 7 and a in terms of these parameters are given in [3]. 
Each measurement of a frequency gives a constraint on the six parameters but the number of 
independent measurements available from a set of tracks depends on the amount of frequency 
evolution of a source as well as on the length of track detected. To estimate parameters from 
the Round 1.3 data we attempted to find the 6 parameters that best matched the shape of 
the recovered tracks. We did not use the power in the tracks for anything other than sky 
position estimation. In principle the relative power in different v harmonics is a measure of the 
eccentricity, while the power in different a sidebands is a measure of the source orientation, 6}^ 
and (pK- The total power of a track constrains D. In the future we plan to use the track power 
to go after Ok, 4>k and D. A time- frequency analysis will never have sensitivity to the remaining 
three phase parameters. 

4. Results 

We summarize the time-frequency properties that we determined for each of the challenge sets 
1.3.1-1.3.4 in the following. Our recovered parameters and the true source parameters are 
compared in Table [TJ 



Challenge 




9S 


c 

D 


m/JVlQ 


71 yf / 71 /T 

IVi /Mq 


(mHz) 


eo 


A 


1.3.1 


Est. 


U.25 


0.31 


r\ no 

0.68 


9.5 


9930000 


0.1895 


0.183 


1.3 


Irue 


U.oUi 


O.olo 


O.Db4 


9.602 


9/5bl5( 


0.1900 


0.19o 


0.926 


Error 


U.Uo 


0.01 


2.4/(: 


i . 1 /o 


1.0 /o 


0.3/0 


/ .6/0 


40/0 


1.3.2 


Ej,I. 


U.-J 


U.ZZ 


U.u 






U.o/ 


U.ZU.j 


1.57 


True 


0.440 


0.198 


0.660 


10.23 


5194453 


0.3192 


0.262 


1.266 


Error 


0.06 


0.02 


9.1% 


6.7% 


0.9% 


0.3% 


3.4% 


24% 


1.3.3 


Est. 


0.42 


1.3 


0.53 


10.4 


5070000 


0.375 


0.164 


1.6 


True 


0.316 


1.263 


0.548 


10.42 


4872374 


0.3660 


0.209 


2.116 


Error 


0.10 


0.04 


3.3% 


0.2% 


4.1% 


2.5% 


21% 


26% 


1.3.4 


Est. 


0.2 


5.3 


0.58 


10.02 


952000 


0.842 


0.43 


1.91 


True 


-0.230 


5.040 


0.584 


10.31 


1021272 


0.8426 


0.402 


1.865 


Error 


0.43 


0.26 


0.7% 


2.8% 


6.8% 


0.1% 


7.0% 


2% 



Table 1. Comparison of recovered parameters with true source parameters. Rows containing 

our recovered parameters are labeled by "Est.", rows of true parameters are labeled "True" and 
"Error" rows give the absolute difference for the angles ^5 and ^5 and the fractional difference 
lAEst/Airue — 1| in percent for the other parameters. 

Challenge 1.3.1 We detected 5 frequency harmonics, two each for n = 2 and n = 3 plus one 
for n = 4. The dominant harmonics for n = 2 and n = 3 were detected throughout most 
of the observation. The total frequency change was only ~ 20 frequency bins over the whole 
observation, which meant we could not determine the compact object mass to better than 5%. 
We were thus left with a degeneracy between e, m and M. For the submitted results we fixed 
m = 9.5M0 and then used the detected tracks to determine the other parameters. 

Challenge 1.3.2 Three harmonics were detected — one each for n = 2,3,4. Since we didn't 
measure any sidebands, we had no sensitivity to the value of a. The evolution equations for v 
and e, and the value of 7/7r depend only the quantity S'cos A. We thus had some sensitivity to 
this combination, but not to S and cos A individually. Our results suggested 5 cos A ~ 0, and so 
for the submitted results we took S = 0.6, i.e., the median of the allowed range, and A = 7r/2. 

Challenge 1.3.3 Tracks were identified from three different harmonics — the dominant n = 2 
harmonic and a sideband, plus the dominant n = 3 harmonic. However, none of these 
harmonics was detected for the whole inspiral — the dominant n = 2 harmonic was detected 
for < t < 2.7 X lO'^'s, the n = 2 sideband was detected for 1.82 x lO'^'s < t < 2.6 x lO'^'s and the 
n = 3 harmonic for 1.62 x lO'^'s < t < 2.7 x lO'^'s. 

Challenge 1.3.4 In this case, we did not detect the signal near t = nor near the plunge. We 
did detect the brightest (assumed to be n = 2) harmonic, and one a sideband for 2.15 x lO^s 

< t < 4.3 x lO^s, and we detected the n = 3 harmonic plus one sideband for 2.15 x lO^s 

< i < 2.9 X lO^s. Due to the low mass of the central black hole, these harmonics evolved 
significantly over the observed track, which provided more information for parameter extraction. 
The complication was in identifying which harmonics we were observing. We took the sidebands 
to be = 0, — 1 and then did a minimization over parameter space to find the sources that 
best reproduced the detected tracks. 



5. Summary and future plans 

The time- frequency algorithm performed weh in the Round 1.3 analysis. The parameter recovery 
was good, although the exact parameter values are in some ways a poor measure of the algorithm 
performance. The method is better suited to determining combinations of parameters by 
measuring frequencies at certain times. To determine parameters, often several good constraints 
had to be combined with a poor one, leading to the appearance of errors in all the parameters. 
The errors in our parameter extraction thus hide the fact that we could measure combinations 
of parameters to high accuracy, which would be a useful input for follow up search algorithms. 
In the future, we hope to improve the technique in several ways. 

5.1. Track search improvements 

• The filtering algorithm described above will be formalized and automated. 

• We will use only the smaller bin sizes that are suitable for parameter extraction, rather 
then all possible bin sizes of the form 2"-* x 2"/. 

• Instead of binning, we will try running the search on multiple spectrograms constructed 
using different time segment lengths. Increasing the segment length increases the frequency 
resolution of the map, although this comes at a price: the source may evolve through several 
frequency bins in a single segment. The optimal bin sizes and segment length will therefore 

be parameter dependent. 

• The spectrogram construction will be improved. If the sky position is known, a sky-position 
optimized spectrogram can be constructed that gives more weight to ^ or £^ when the 
detector is favorably oriented for that sky location. Since we can estimate sky position 
quite well, it should be possible to do this construction. 

• Other possible improvements include using a higher specified FAP for the search and then 
rejecting noise clusters in post-processing, and binning using "linear chirp" bins rather than 
rectangular bins. 

Eventually, we must also prepare for more realistic data sets containing multiple EMRI tracks 
that might cross each other. In that case, our simple cluster cleaning technique will not work, 
and more sophisticated approaches will be needed to decide which of the multiple clusters should 
be associated with the same track. 

5.2. Parameter estimation improvements 

• Track power, not just track shape, will be utilized for parameter estimation. 

• Markov Chain Monte Carlo methods will be used to find the parameters that best recover 
the detected time-frequency tracks. This matched filtering of the recovered tracks will 
formalize the search method, and provide a way to quantify the errors in our parameter 

estimates. 

• We will follow up the t-f search with a matched filtering search to improve parameter 
estimates. 
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